Luke Sturgeon
Stat 517
Due: 10/29/18
Following the same convention as .ipynb notebooks, I load all the packages and data that this problem needs at the beginning. It helps with analysis because, if I have to rerun a code block, I don’t have to take the extra time to install and load packages before working with the data. The actual loading code is not shown because it takes too much space when we only have twenty pages to work with.
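For reference, a preamble along these lines would load everything used below. The exact package set is an assumption inferred from the functions called later in the report.

```r
# Packages inferred from the function calls in this report (assumed set):
library(ggplot2)     # qplot
library(plotly)      # plot_ly
library(dplyr)       # filter, inner_join; as_data_frame comes via tibble
library(mclust)      # Mclust, MclustDR, mclustBIC, coordProj
library(factoextra)  # fviz_nbclust
library(dendextend)  # color_branches, set, circlize_dendrogram
library(seriation)   # seriate, pimage, ser_align, criterion, get_order
```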
To see how the data is structured and how it generally behaves, we go through some basic exploratory data analysis.
qplot(x=Happiness.Score, y=Freedom, data=happy2015,main="Happiness Score by Freedom",xlab="Happiness Score",ylab="Measurement of Freedom",col=Region)
We see the general shape of the data in a 2D plot. In terms of freedom, the countries do appear to roughly group themselves by region, and some regions have more “Freedom” than others, which correlates with a higher happiness rating. Since this is only a 2D view, we’re only getting a small slice of how the data is behaving. Next we run a dimension reduction (MclustDR, which behaves much like PCA here), not to reduce a relatively small data set even further, but to see which two of the seven continuous explanatory variables explain the most variation. Then we graph that data to get a better idea of how the data is behaving.
mclust2015 <- Mclust(happy2015[,c(6:12)])
pca2015DR <- MclustDR(mclust2015)
summary(pca2015DR)
## -----------------------------------------------------------------
## Dimension reduction for model-based clustering and classification
## -----------------------------------------------------------------
##
## Mixture model type: Mclust (VVE, 4)
##
## Clusters n
## 1 15
## 2 38
## 3 60
## 4 45
##
## Estimated basis vectors:
## Dir1 Dir2 Dir3 Dir4
## Economy..GDP.per.Capita. -0.1783485 0.38081007 -0.2179213 0.339409
## Family 0.1413050 -0.27236196 0.0456991 -0.127763
## Health..Life.Expectancy. 0.0748211 -0.73476920 -0.0519320 -0.247679
## Freedom -0.2430220 0.24178757 -0.1266290 0.407552
## Trust..Government.Corruption. 0.9065946 -0.13996341 0.4649210 0.442524
## Generosity 0.2481630 0.40356642 -0.8458748 -0.666722
## Dystopia.Residual -0.0082607 -0.00018282 -0.0058679 -0.026247
## Dir5 Dir6 Dir7
## Economy..GDP.per.Capita. -0.126566 0.24974 -0.153783
## Family 0.701175 0.18667 -0.090611
## Health..Life.Expectancy. -0.423914 -0.31245 0.022638
## Freedom 0.397669 -0.82955 0.786008
## Trust..Government.Corruption. -0.356183 0.21456 0.029938
## Generosity -0.146243 0.25902 -0.066355
## Dystopia.Residual -0.078983 0.06233 0.586958
##
## Dir1 Dir2 Dir3 Dir4 Dir5 Dir6 Dir7
## Eigenvalues 0.70825 0.67259 0.6315 0.495 0.27847 0.15837 0.082679
## Cum. % 23.39884 45.61971 66.4829 82.837 92.03641 97.26847 100.000000
We see that GDP per Capita and Family are the measures that explain the most variance. We’ll map Happiness Score against those.
plot3D <- plot_ly(happy2015, x = ~Happiness.Score, y = ~Family, z = ~Economy..GDP.per.Capita., color =~ Region) %>%
add_markers() %>%
layout(scene = list(xaxis = list(title = 'Happiness Score'),
yaxis = list(title = 'Family Measurement'),
zaxis = list(title = 'GDP')))
plot3D
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
Since the functionality of the “mclust” package is so diverse, we start this question with that package. From there we move on to k-means clustering and hierarchical clustering to see what each method can tell us about the happiness data. The same methods will be repeated across all data sets, so we can see, for example, whether the optimal number of components for the Mclust function differs between 2015, 2016, and 2017.
mclust can be used as a supervised or unsupervised learning method, the difference being whether or not we use the happiness rankings of each country. For unsupervised learning, we ignore the happiness data and see how our models group the countries on their own. Once the model has grouped the data, we go back and check how many countries the model categorized as expected.
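A minimal sketch of that unsupervised workflow, using the built-in iris data as a stand-in for the happiness data (the happiness objects themselves are assumed loaded elsewhere): fit the mixture model with the labels withheld, then check agreement against the known labels afterwards.

```r
library(mclust)

# Fit an unsupervised mixture model; the Species labels are withheld.
fit <- Mclust(iris[, 1:4])

# Cross-tabulate the discovered clusters against the known labels.
agreement <- table(Truth = iris$Species, Cluster = fit$classification)

# Adjusted Rand index: 1 = perfect agreement, ~0 = chance-level agreement.
ari <- adjustedRandIndex(fit$classification, iris$Species)
```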
print(summary(mclust2015))
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VVE (ellipsoidal, equal orientation) model with 4 components:
##
## log.likelihood n df BIC ICL
## 443.0723 158 80 481.137 464.3402
##
## Clustering table:
## 1 2 3 4
## 15 38 60 45
plot(mclust2015, what = "BIC")
happy2015.scale <- scale(happy2015[,5:12])
fviz_nbclust(happy2015.scale,kmeans,'wss')
fviz_nbclust(happy2015.scale,kmeans,'silhouette')
The output of Mclust and the graphs both show that clustering is optimized when we use 4 components. I suspect the members of these groups are located near each other spatially. Let’s see how closely Mclust comes to grouping the countries by region.
We now explore what model based clustering looks like within this dataset.
happy2015BIC <- mclustBIC(happy2015[,c(6:12)])
happy2015summary <- summary(happy2015BIC, data = happy2015[,c(6:12)], G = 4)
par(mfrow=c(1,2))
coordProj(data = happy2015[,c(6:12)], dimens = c(1,2), what = "classification",parameters = happy2015summary$parameters, z = happy2015summary$z)
coordProj(data = happy2015[,c(6:12)], dimens = c(1,2), what = "uncertainty", parameters = happy2015summary$parameters, z = happy2015summary$z)
happy2015.table.mclust <- table(happy2015$Country,mclust2015$classification)
happy2015.table.mclust <- as_data_frame(happy2015.table.mclust)
print(filter(happy2015.table.mclust, Var2==1,n==1))
## # A tibble: 15 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Australia 1 1
## 2 Austria 1 1
## 3 Belgium 1 1
## 4 Canada 1 1
## 5 Denmark 1 1
## 6 Finland 1 1
## 7 Germany 1 1
## 8 Iceland 1 1
## 9 Ireland 1 1
## 10 Netherlands 1 1
## 11 New Zealand 1 1
## 12 Norway 1 1
## 13 Sweden 1 1
## 14 Switzerland 1 1
## 15 United Kingdom 1 1
print(filter(happy2015.table.mclust, Var2==2,n==1))
## # A tibble: 38 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Bahrain 2 1
## 2 Bhutan 2 1
## 3 Cambodia 2 1
## 4 Djibouti 2 1
## 5 Dominican Republic 2 1
## 6 Georgia 2 1
## 7 Haiti 2 1
## 8 Hong Kong 2 1
## 9 Indonesia 2 1
## 10 Iran 2 1
## # ... with 28 more rows
print(filter(happy2015.table.mclust, Var2==3,n==1))
## # A tibble: 60 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Albania 3 1
## 2 Algeria 3 1
## 3 Argentina 3 1
## 4 Armenia 3 1
## 5 Belarus 3 1
## 6 Bosnia and Herzegovina 3 1
## 7 Brazil 3 1
## 8 Bulgaria 3 1
## 9 Chile 3 1
## 10 China 3 1
## # ... with 50 more rows
print(filter(happy2015.table.mclust, Var2==4,n==1))
## # A tibble: 45 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Afghanistan 4 1
## 2 Angola 4 1
## 3 Azerbaijan 4 1
## 4 Bangladesh 4 1
## 5 Benin 4 1
## 6 Bolivia 4 1
## 7 Botswana 4 1
## 8 Burkina Faso 4 1
## 9 Burundi 4 1
## 10 Cameroon 4 1
## # ... with 35 more rows
In the first graph we see the 4 groups each trying to capture the observations that belong to them, with the boundaries mclust has created for each category. Although these appear as hard lines, mclust uses a distribution to give each point a probability of belonging to each category rather than assigning it a single category outright. The second graph shows the associated uncertainty of each observation’s classification. The tibbles printed show a small selection of the countries that have been classified together. From these outputs, it looks like model-based clustering has put the countries into groups much as we would expect: the more developed countries tend to be lumped together, and the less developed likewise.
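Those soft assignments can be inspected directly. A sketch on iris as a stand-in: `$z` holds each observation’s membership probability for each component, and `$uncertainty` is one minus the maximum of those probabilities, which is what the uncertainty plot visualizes.

```r
library(mclust)

fit <- Mclust(iris[, 1:4])

head(round(fit$z, 3))   # per-component membership probabilities; rows sum to 1
head(fit$uncertainty)   # 1 - max membership probability per observation
```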
We now explore what hierarchical clustering looks like with this dataset.
e <- dist(happy2015.scale,method="euclidean")
m <- dist(happy2015.scale, method="maximum")
man <- dist(happy2015.scale, method="manhattan")
b <- dist(happy2015.scale, method="binary")
hclust.w.b <- hclust(b, method="ward.D2")
hclust.w.m <- hclust(m, method="ward.D2")
hclust.w.man <- hclust(man, method="ward.D2")
hclust.w.e <- hclust(e, method="ward.D2")
plot(hclust.w.e)
rect.hclust(hclust.w.e,k=4,border = "red")
dend <- as.dendrogram(hclust.w.e)
dend <- dend %>% color_branches(k=4) %>% color_labels %>% set("branches_lwd", c(2,1,2,1)) %>% set("branches_lty", c(1,2,1,2))
# Leaf labels are row indices; replace them with the country names.
labels(dend) <- as.character(happy2015$Country)[as.numeric(labels(dend))]
circlize_dendrogram(dend)
As you can see from this dendrogram, the countries have been grouped roughly the same way as in model-based clustering: countries with similar levels of development have been lumped together.
We now see what k-means clustering looks like within the data.
k4 <- kmeans(happy2015.scale,4)
k7 <- kmeans(happy2015.scale,7)
k10 <- kmeans(happy2015.scale,10)
plot(happy2015.scale, main="K-Means Classifications, k=4")
points(happy2015.scale,pch=k4$cluster+1, col=k4$cluster+1)
points(k4$centers, col=2:5, pch=2:5, cex=1.5)
k.table <- table(happy2015$Country, k4$cluster)
k.table <- as_data_frame(k.table)
plot(happy2015.scale, main="K-Means Classifications, k=7")
points(happy2015.scale,pch=k7$cluster+1, col=k7$cluster+1)
points(k7$centers, col=2:8, pch=2:8, cex=1.5)
plot(happy2015.scale, main="K-Means Classifications, k=10")
points(happy2015.scale,pch=k10$cluster+1, col=k10$cluster+1)
points(k10$centers, col=2:11, pch=2:11, cex=1.5)
Tables from the kmeans model with 4 centers.
print(filter(k.table, Var2==1,n==1))
## # A tibble: 15 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Australia 1 1
## 2 Austria 1 1
## 3 Belgium 1 1
## 4 Canada 1 1
## 5 Denmark 1 1
## 6 Finland 1 1
## 7 Germany 1 1
## 8 Iceland 1 1
## 9 Ireland 1 1
## 10 Netherlands 1 1
## 11 New Zealand 1 1
## 12 Norway 1 1
## 13 Sweden 1 1
## 14 Switzerland 1 1
## 15 United Kingdom 1 1
print(filter(k.table, Var2==2,n==1))
## # A tibble: 38 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Bahrain 2 1
## 2 Bhutan 2 1
## 3 Cambodia 2 1
## 4 Djibouti 2 1
## 5 Dominican Republic 2 1
## 6 Georgia 2 1
## 7 Haiti 2 1
## 8 Hong Kong 2 1
## 9 Indonesia 2 1
## 10 Iran 2 1
## # ... with 28 more rows
print(filter(k.table, Var2==3,n==1))
## # A tibble: 60 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Albania 3 1
## 2 Algeria 3 1
## 3 Argentina 3 1
## 4 Armenia 3 1
## 5 Belarus 3 1
## 6 Bosnia and Herzegovina 3 1
## 7 Brazil 3 1
## 8 Bulgaria 3 1
## 9 Chile 3 1
## 10 China 3 1
## # ... with 50 more rows
print(filter(k.table, Var2==4,n==1))
## # A tibble: 45 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Afghanistan 4 1
## 2 Angola 4 1
## 3 Azerbaijan 4 1
## 4 Bangladesh 4 1
## 5 Benin 4 1
## 6 Bolivia 4 1
## 7 Botswana 4 1
## 8 Burkina Faso 4 1
## 9 Burundi 4 1
## 10 Cameroon 4 1
## # ... with 35 more rows
K-means seems to have grouped the countries together pretty well again, although groups 2 and 3 appear to have switched places in terms of the development of the countries.
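Cluster numbers are arbitrary labels, so “groups switching places” just means a relabeling, not a different partition. A contingency table between two clusterings exposes the permutation; a sketch on scaled iris as a stand-in:

```r
# Two k-means runs with different seeds may find the same partition under
# permuted labels; a cross-tab between them reveals the correspondence.
set.seed(1)
run.a <- kmeans(scale(iris[, 1:4]), centers = 3)$cluster
set.seed(2)
run.b <- kmeans(scale(iris[, 1:4]), centers = 3)$cluster

table(run.a, run.b)  # if partitions agree, each row concentrates in one column
```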
The process listed above has been repeated for the 2016 and 2017 World Happiness datasets. This data analysis will help answer whether happiness has changed over time among groups.
mclust2016 <- Mclust(happy2016[,c(6:12)])
pca2016DR <- MclustDR(mclust2016)
summary(pca2016DR)
## -----------------------------------------------------------------
## Dimension reduction for model-based clustering and classification
## -----------------------------------------------------------------
##
## Mixture model type: Mclust (VEE, 3)
##
## Clusters n
## 1 22
## 2 72
## 3 63
##
## Estimated basis vectors:
## Dir1 Dir2 Dir3 Dir4
## Upper.Confidence.Interval -0.160674 -0.034311 0.2648896 -0.076094
## Economy..GDP.per.Capita. -0.205553 0.175372 -0.4056833 0.083286
## Family 0.055971 -0.021210 0.0041557 0.053468
## Health..Life.Expectancy. -0.576176 -0.311691 -0.4634187 0.305161
## Freedom 0.305937 0.040199 0.1523665 0.083617
## Trust..Government.Corruption. -0.534179 0.863909 -0.6324647 -0.524176
## Generosity 0.466771 0.350018 0.3567256 0.780730
## Dir5 Dir6 Dir7
## Upper.Confidence.Interval 0.098529 0.0022748 -0.056556
## Economy..GDP.per.Capita. 0.175629 -0.3823647 0.200075
## Family -0.267768 0.2338001 0.674472
## Health..Life.Expectancy. -0.349742 0.4942686 -0.545396
## Freedom -0.793034 -0.4914383 -0.432670
## Trust..Government.Corruption. -0.025323 0.5597458 0.016135
## Generosity 0.368622 0.0011255 0.130150
##
## Dir1 Dir2 Dir3 Dir4 Dir5 Dir6 Dir7
## Eigenvalues 0.59996 0.40894 0.2022 0.17008 0.14729 0.14577 0.12061
## Cum. % 33.42667 56.21060 67.4761 76.95234 85.15860 93.28034 100.00000
plot3D <- plot_ly(happy2016, x = ~Happiness.Score, y = ~Family, z = ~Economy..GDP.per.Capita., color =~ Region) %>%
add_markers() %>%
layout(scene = list(xaxis = list(title = 'Happiness Score'),
yaxis = list(title = 'Family Measurement'),
zaxis = list(title = 'GDP')))
plot3D
print(summary(mclust2016))
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VEE (ellipsoidal, equal shape and orientation) model with 3
## components:
##
## log.likelihood n df BIC ICL
## 381.6596 157 53 495.3382 480.3675
##
## Clustering table:
## 1 2 3
## 22 72 63
plot(mclust2016, what = "BIC")
happy2016.scale <- scale(happy2016[,5:12])
fviz_nbclust(happy2016.scale,kmeans,'wss')
fviz_nbclust(happy2016.scale,kmeans,'silhouette')
happy2016BIC <- mclustBIC(happy2016[,c(6:12)])
happy2016summary <- summary(happy2016BIC, data = happy2016[,c(6:12)], G = 3)
par(mfrow=c(1,2))
coordProj(data = happy2016[,c(6:12)], dimens = c(1,2), what = "classification",parameters = happy2016summary$parameters, z = happy2016summary$z)
coordProj(data = happy2016[,c(6:12)], dimens = c(1,2), what = "uncertainty", parameters = happy2016summary$parameters, z = happy2016summary$z)
happy2016.table.mclust <- table(happy2016$Country,mclust2016$classification)
happy2016.table.mclust <- as_data_frame(happy2016.table.mclust)
print(filter(happy2016.table.mclust, Var2==1,n==1))
## # A tibble: 22 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Australia 1 1
## 2 Austria 1 1
## 3 Bahrain 1 1
## 4 Belgium 1 1
## 5 Canada 1 1
## 6 Denmark 1 1
## 7 Finland 1 1
## 8 Germany 1 1
## 9 Hong Kong 1 1
## 10 Ireland 1 1
## # ... with 12 more rows
print(filter(happy2016.table.mclust, Var2==2,n==1))
## # A tibble: 72 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Albania 2 1
## 2 Algeria 2 1
## 3 Argentina 2 1
## 4 Armenia 2 1
## 5 Azerbaijan 2 1
## 6 Belarus 2 1
## 7 Belize 2 1
## 8 Bolivia 2 1
## 9 Bosnia and Herzegovina 2 1
## 10 Brazil 2 1
## # ... with 62 more rows
print(filter(happy2016.table.mclust, Var2==3,n==1))
## # A tibble: 63 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Afghanistan 3 1
## 2 Angola 3 1
## 3 Bangladesh 3 1
## 4 Benin 3 1
## 5 Bhutan 3 1
## 6 Botswana 3 1
## 7 Burkina Faso 3 1
## 8 Burundi 3 1
## 9 Cambodia 3 1
## 10 Cameroon 3 1
## # ... with 53 more rows
There is no “Region” column in the 2017 data, so a little more preprocessing has to be done with this dataset than with the others.
region2016 <- happy2016[,1:2]
happy2017 <- inner_join(region2016,happy2017)
## Warning: Column `Country` joining factors with different levels, coercing
## to character vector
mclust2017 <- Mclust(happy2017[,c(6:12)])
pca2017DR <- MclustDR(mclust2017)
summary(pca2017DR)
## -----------------------------------------------------------------
## Dimension reduction for model-based clustering and classification
## -----------------------------------------------------------------
##
## Mixture model type: Mclust (VEE, 3)
##
## Clusters n
## 1 15
## 2 61
## 3 74
##
## Estimated basis vectors:
## Dir1 Dir2 Dir3 Dir4
## Whisker.low -0.08450 0.149450 0.0034663 0.133739
## Economy..GDP.per.Capita. 0.26474 -0.182442 -0.1310164 -0.047661
## Family -0.15286 -0.062469 0.0011667 -0.322320
## Health..Life.Expectancy. -0.65048 -0.224081 0.0371653 -0.316534
## Freedom 0.14329 -0.076132 0.0874688 0.313591
## Generosity 0.31868 0.014686 -0.8505273 -0.089028
## Trust..Government.Corruption. 0.59514 0.940347 0.5003924 -0.818226
## Dir5 Dir6 Dir7
## Whisker.low -0.124942 0.036717 0.031008
## Economy..GDP.per.Capita. -0.015584 -0.705239 0.222413
## Family 0.253350 0.437777 0.417032
## Health..Life.Expectancy. 0.169880 0.391270 -0.813439
## Freedom 0.927377 -0.340893 -0.223943
## Generosity -0.071836 0.084268 -0.029470
## Trust..Government.Corruption. -0.160975 0.182317 -0.250903
##
## Dir1 Dir2 Dir3 Dir4 Dir5 Dir6
## Eigenvalues 0.55644 0.32718 0.31721 0.24138 0.20769 0.20389
## Cum. % 27.55042 43.74976 59.45560 71.40657 81.68969 91.78472
## Dir7
## Eigenvalues 0.16593
## Cum. % 100.00000
plot3D <- plot_ly(happy2017, x = ~Happiness.Score, y = ~Family, z = ~Economy..GDP.per.Capita., color =~ Region) %>%
add_markers() %>%
layout(scene = list(xaxis = list(title = 'Happiness Score'),
yaxis = list(title = 'Family Measurement'),
zaxis = list(title = 'GDP')))
plot3D
print(summary(mclust2017))
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VEE (ellipsoidal, equal shape and orientation) model with 3
## components:
##
## log.likelihood n df BIC ICL
## 386.5254 150 53 507.4872 496.3558
##
## Clustering table:
## 1 2 3
## 15 61 74
plot(mclust2017, what = "BIC")
happy2017.scale <- scale(happy2017[,5:12])
fviz_nbclust(happy2017.scale,kmeans,'wss')
fviz_nbclust(happy2017.scale,kmeans,'silhouette')
happy2017BIC <- mclustBIC(happy2017[,c(6:12)])
happy2017summary <- summary(happy2017BIC, data = happy2017[,c(6:12)], G = 3)
par(mfrow=c(1,2))
coordProj(data = happy2017[,c(6:12)], dimens = c(1,2), what = "classification",parameters = happy2017summary$parameters, z = happy2017summary$z)
coordProj(data = happy2017[,c(6:12)], dimens = c(1,2), what = "uncertainty", parameters = happy2017summary$parameters, z = happy2017summary$z)
happy2017.table.mclust <- table(happy2017$Country,mclust2017$classification)
happy2017.table.mclust <- as_data_frame(happy2017.table.mclust)
print(filter(happy2017.table.mclust, Var2==1,n==1))
## # A tibble: 15 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Australia 1 1
## 2 Austria 1 1
## 3 Belgium 1 1
## 4 Canada 1 1
## 5 Denmark 1 1
## 6 Finland 1 1
## 7 Germany 1 1
## 8 Ireland 1 1
## 9 Luxembourg 1 1
## 10 Netherlands 1 1
## 11 New Zealand 1 1
## 12 Norway 1 1
## 13 Sweden 1 1
## 14 Switzerland 1 1
## 15 United Kingdom 1 1
print(filter(happy2017.table.mclust, Var2==2,n==1))
## # A tibble: 61 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Afghanistan 2 1
## 2 Angola 2 1
## 3 Azerbaijan 2 1
## 4 Bahrain 2 1
## 5 Bangladesh 2 1
## 6 Belize 2 1
## 7 Benin 2 1
## 8 Bhutan 2 1
## 9 Botswana 2 1
## 10 Burkina Faso 2 1
## # ... with 51 more rows
print(filter(happy2017.table.mclust, Var2==3,n==1))
## # A tibble: 74 x 3
## Var1 Var2 n
## <chr> <chr> <int>
## 1 Albania 3 1
## 2 Algeria 3 1
## 3 Argentina 3 1
## 4 Armenia 3 1
## 5 Belarus 3 1
## 6 Bolivia 3 1
## 7 Bosnia and Herzegovina 3 1
## 8 Brazil 3 1
## 9 Bulgaria 3 1
## 10 Chile 3 1
## # ... with 64 more rows
We use seriation methods with these datasets to see if the order of rankings is what the data suggests or not.
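A minimal sketch of the seriation workflow on built-in data, before applying it to the happiness sets: reorder a distance matrix so that similar observations end up adjacent, then pull out the permutation. (The default method for `dist` objects is Spectral in recent versions of the package, matching the output below.)

```r
library(seriation)

# Distances between cars on a few scaled continuous measures.
d.small <- dist(scale(mtcars[, c("mpg", "hp", "wt")]))

ord  <- seriate(d.small)        # reorder so similar rows sit near each other
perm <- get_order(ord)          # the resulting permutation of the 32 cars
rownames(mtcars)[head(perm)]    # first few cars in seriated order
```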
d <- dist(happy2015.scale)
ser <- seriate(d)
ser
## object of class 'ser_permutation', 'list'
## contains permutation vectors for 1-mode data
##
## vector length seriation method
## 1 158 Spectral
pimage(d,main="Random")
pimage(d,ser,main="Reordered")
It doesn’t look like this method of seriation is working very well, so we’ll try other methods. If that doesn’t fix the problem, perhaps scaling the data is interfering with our outputs.
methods <- c("TSP", "R2E", "HC", "GW", "OLO", "ARSA")
ser <- sapply(methods, FUN = function(m) seriate(d,m))
ser <- ser_align(ser)
for(s in ser) pimage(d,s, main=get_method(s), key=FALSE)
crit <- sapply(ser, FUN = function(x) criterion(d, x))
t(crit)
## 2SUM AR_deviations AR_events BAR Cor_R Gradient_raw
## TSP 21516315 525424.3 528931 222867.4 0.06428266 232050
## R2E 23338014 649569.3 611291 220773.3 0.07215966 67330
## HC 20674627 407275.8 469413 219733.1 0.08358806 351086
## GW 20711274 445327.5 480312 223417.1 0.07909177 329288
## OLO 19815270 294589.0 389894 211667.9 0.10714782 510124
## ARSA 18412276 144742.5 243107 205173.1 0.15062226 803698
## Gradient_weighted Inertia Lazy_path_length Least_squares LS
## TSP 460790.3 425871447 20281.97 93667780 9576130
## R2E 179077.7 387198720 34600.52 94043397 9763938
## HC 679114.5 451020496 23937.95 93376682 9430581
## GW 590985.9 442222184 21981.73 93494186 9489333
## OLO 924874.5 470901987 20466.98 93049002 9266740
## ARSA 1302596.5 507478570 34269.95 92545372 9014926
## ME Moore_stress Neumann_stress Path_length RGAR
## TSP 2876.963 1066.995 578.1155 254.6268 0.4100520
## R2E 2767.016 1728.370 967.9409 432.5894 0.4739013
## HC 2849.335 1247.282 675.1554 309.3566 0.3639109
## GW 2868.269 1113.763 606.9756 277.6768 0.3723603
## OLO 2875.492 1048.531 575.7316 260.3551 0.3022640
## ARSA 2811.401 1463.608 837.4127 403.2268 0.1884679
The graphs appear misleading when compared to the numeric output of the seriation methods. Six methods have been compared. The graph for TSP is among the fuzziest; rather than reordering the data so that the objects sit as close to the diagonal as possible, it appears to have barely permuted the data at all. The output shows, however, that the Hamiltonian path length of the ordering has been minimized. For this reason, we will use this method when picking the order the data suggests the countries should be in.
ser.order <- seriate(d,method = "TSP")
country.order <- get_order(ser.order)
list <- as.character(happy2015$Country)[country.order]
print(head(list))
## [1] "Syria" "Myanmar" "Haiti"
## [4] "Somaliland region" "Laos" "Cambodia"
tail(list)
## [1] "Luxembourg" "Singapore" "Qatar" "Hong Kong" "Rwanda"
## [6] "Georgia"
It looks like the start of our list tends to be the less developed countries, so I would assume the least happy countries are listed first. The methods are repeated for the 2016 and 2017 data.
d16 <- dist(happy2016.scale)
ser <- seriate(d16)
ser
## object of class 'ser_permutation', 'list'
## contains permutation vectors for 1-mode data
##
## vector length seriation method
## 1 158 Spectral
pimage(d16,main="Random")
pimage(d16,ser,main="Reordered")
ser.order <- seriate(d16,method = "TSP")
country.order <- get_order(ser.order)
list <- as.character(happy2016$Country)[country.order]
print(head(list))
## [1] "Afghanistan" "Mauritania" "Georgia" "Greece"
## [5] "Burkina Faso" "Malawi"
tail(list)
## [1] "Kyrgyzstan" "Zimbabwe" "Madagascar" NA
## [5] "Burundi" "South Africa"
d17 <- dist(happy2017.scale)
ser <- seriate(d17)
ser
## object of class 'ser_permutation', 'list'
## contains permutation vectors for 1-mode data
##
## vector length seriation method
## 1 150 Spectral
pimage(d17,main="Random")
pimage(d17,ser,main="Reordered")
ser.order <- seriate(d17,method = "TSP")
country.order <- get_order(ser.order)
list <- as.character(happy2017$Country)[country.order]
print(head(list))
## [1] "Latvia" "Venezuela" "Trinidad and Tobago"
## [4] "Mauritius" "Chile" "Brazil"
tail(list)
## [1] "Uruguay" "Somalia" NA "Haiti" NA "Mali"